Goto

Collaborating Authors

 target response


MoCoRP: Modeling Consistent Relations between Persona and Response for Persona-based Dialogue

arXiv.org Artificial Intelligence

As dialogue systems become increasingly important across various domains, a key challenge in persona-based dialogue is generating engaging and context-specific interactions while ensuring the model acts with a coherent personality. However, existing persona-based dialogue datasets lack explicit relations between persona sentences and responses, which makes it difficult for models to effectively capture persona information. To address these issues, we propose MoCoRP (Modeling Consistent Relations between Persona and Response), a framework that incorporates explicit relations into language models. MoCoRP leverages an NLI expert to explicitly extract the NLI relations between persona sentences and responses, enabling the model to effectively incorporate appropriate persona information from the context into its responses. We applied this framework to pre-trained models like BART and further extended it to modern large language models (LLMs) through alignment tuning. Experimental results on the public datasets ConvAI2 and MPChat demonstrate that MoCoRP outperforms existing baselines, achieving superior persona consistency and engaging, context-aware dialogue generation. Furthermore, our model not only excels in quantitative metrics but also shows significant improvements in qualitative aspects. These results highlight the effectiveness of explicitly modeling persona-response relations in persona-based dialogue. The source codes of MoCoRP are available at https://github.com/DMCB-GIST/MoCoRP.


Rethinking Cross-lingual Gaps from a Statistical Viewpoint

arXiv.org Artificial Intelligence

Any piece of knowledge is usually expressed in one or a handful of natural languages on the web or in any large corpus. Large Language Models (LLMs) act as a bridge by acquiring knowledge from a source language and making it accessible when queried from target languages. Prior research has pointed to a cross-lingual gap, viz., a drop in accuracy when the knowledge is queried in a target language compared to when the query is in the source language. Existing research has rationalized divergence in latent representations in source and target languages as the source of cross-lingual gap. In this work, we take an alternative view and hypothesize that the variance of responses in the target language is the main cause of this gap. We present extensive experimental evidence which support proposed formulation and hypothesis. We then reinforce our hypothesis through multiple inference-time interventions that control the variance and reduce the cross-lingual gap. We demonstrate a simple prompt instruction to reduce the response variance, which improved target accuracy by 20-25% across different models. Large Language Models (LLMs) have revolutionized information access. Central to LLM's mission is to assimilate knowledge universally and make it available generally without any barriers. State-of-art LLMs are multilingual: Gemini supports over 40 languages (Gemini, 2025), GPT -5 supports at least 12 languages (GPT, 2025) (with no official number of supported languages) and open-source models like Gemma-3 support over 100 spoken languages (Gemma, 2025). Because pretraining data cannot contain duplicate information for every language, cross-lingual generalization is a necessary capability for LLMs. However, LLMs are known to have disparity in recalling knowledge across languages (Jiang et al., 2020; Kassner et al., 2021; Qi et al., 2023; Chua et al., 2024a; Goldman et al., 2025). Our objective is to understand the causes of poor transfer of knowledge encoded in parameters across languages. We, therefore, evaluate models on knowledge-intensive tasks in a closed-book QA setting, i.e., without access to such tools as grounding in search. Cross-lingual gaps are quantified through disparity on parallel datasets that alter language-specific surface form of the prompts.


Inverting Trojans in LLMs

arXiv.org Artificial Intelligence

While effective backdoor detection and inversion schemes have been developed for AIs used e.g. for images, there are challenges in "porting" these methods to LLMs. First, the LLM input space is discrete, which precludes gradient-based search over this space, central to many backdoor inversion methods. Second, there are ~30,000^k k-tuples to consider, k the token-length of a putative trigger. Third, for LLMs there is the need to blacklist tokens that have strong marginal associations with the putative target response (class) of an attack, as such tokens give false detection signals. However, good blacklists may not exist for some domains. We propose a LLM trigger inversion approach with three key components: i) discrete search, with putative triggers greedily accreted, starting from a select list of singletons; ii) implicit blacklisting, achieved by evaluating the average cosine similarity, in activation space, between a candidate trigger and a small clean set of samples from the putative target class; iii) detection when a candidate trigger elicits high misclassifications, and with unusually high decision confidence. Unlike many recent works, we demonstrate that our approach reliably detects and successfully inverts ground-truth backdoor trigger phrases.


A suite of LMs comprehend puzzle statements as well as humans

arXiv.org Artificial Intelligence

Recent claims suggest that large language models (LMs) underperform humans in comprehending minimally complex English statements (Dentella et al., 2024). Here, we revisit those findings and argue that human performance was overestimated, while LLM abilities were underestimated. Using the same stimuli, we report a preregistered study comparing human responses in two conditions: one allowed rereading (replicating the original study), and one that restricted rereading (a more naturalistic comprehension test). Human accuracy dropped significantly when rereading was restricted (73%), falling below that of Falcon-180B-Chat (76%) and GPT-4 (81%). The newer GPT-o1 model achieves perfect accuracy. Results further show that both humans and models are disproportionately challenged by queries involving potentially reciprocal actions (e.g., kissing), suggesting shared pragmatic sensitivities rather than model-specific deficits. Additional analyses using Llama-2-70B log probabilities, a recoding of open-ended model responses, and grammaticality ratings of other sentences reveal systematic underestimation of model performance. We find that GPT-4o can align with either naive or expert grammaticality judgments, depending on prompt framing. These findings underscore the need for more careful experimental design and coding practices in LLM evaluation, and they challenge the assumption that current models are inherently weaker than humans at language comprehension.


Aligning Large Language Models to Follow Instructions and Hallucinate Less via Effective Data Filtering

arXiv.org Artificial Intelligence

Training LLMs on data containing unfamiliar knowledge during the instruction tuning stage can encourage hallucinations. To address this challenge, we introduce NOVA, a novel framework designed to identify high-quality data that aligns well with the LLM's learned knowledge to reduce hallucinations. NOVA includes Internal Consistency Probing (ICP) and Semantic Equivalence Identification (SEI) to measure how familiar the LLM is with instruction data. Specifically, ICP evaluates the LLM's understanding of the given instruction by calculating the tailored consistency among multiple self-generated responses. SEI further assesses the familiarity of the LLM with the target response by comparing it to the generated responses, using the proposed semantic clustering and well-designed voting strategy. Finally, to ensure the quality of selected samples, we introduce an expert-aligned reward model, considering characteristics beyond just familiarity. By considering data quality and avoiding unfamiliar data, we can utilize the selected data to effectively align LLMs to follow instructions and hallucinate less.


Real-Time Textless Dialogue Generation

arXiv.org Artificial Intelligence

Recent advancements in large language models (LLMs) have led to significant progress in text-based dialogue systems. These systems can now generate high-quality responses that are accurate and coherent across a wide range of topics and tasks. However, spoken dialogue systems still lag behind in terms of naturalness. They tend to produce robotic interactions, with issues such as slow response times, overly generic or cautious replies, and a lack of natural rhythm and fluid turn-taking. This shortcoming is largely due to the over-reliance on the traditional cascaded design, which involve separate, sequential components, as well as the use of text as an intermediate representation. This paper propose a real-time, textless spoken dialogue generation model (RTTL-DG) that aims to overcome these challenges. Our system enables fluid turn-taking and generates responses with minimal delay by processing streaming spoken conversation directly. Additionally, our model incorporates backchannels, filters, laughter, and other paralinguistic signals, which are often absent in cascaded dialogue systems, to create more natural and human-like interactions. The implementations and generated samples are available in our repository: https://github.com/mailong25/rts2s-dg


Dialogue Language Model with Large-Scale Persona Data Engineering

arXiv.org Artificial Intelligence

Maintaining persona consistency is paramount in the application of open-domain dialogue systems, as exemplified by models like ChatGPT. Despite significant advancements, the limited scale and diversity of current persona dialogue datasets remain challenges to achieving robust persona-consistent dialogue models. In this study, drawing inspiration from the success of large-scale pre-training, we introduce PPDS, an open-domain persona dialogue system that employs extensive generative pre-training on a persona dialogue dataset to enhance persona consistency. Specifically, we present a persona extraction model designed to autonomously and precisely generate vast persona dialogue datasets. Additionally, we unveil a pioneering persona augmentation technique to address the invalid persona bias inherent in the constructed dataset. Both quantitative and human evaluations consistently highlight the superior response quality and persona consistency of our proposed model, underscoring its effectiveness.


Empathy Level Alignment via Reinforcement Learning for Empathetic Response Generation

arXiv.org Artificial Intelligence

Empathetic response generation, aiming at understanding the user's situation and feelings and respond empathically, is crucial in building human-like dialogue systems. Previous methods mainly focus on using maximum likelihood estimation as the optimization objective for training response generation models, without taking into account the empathy level alignment between generated responses and target responses. To this end, we propose an empathetic response generation using reinforcement learning (EmpRL) framework. The framework designs an effective empathy reward function and generates empathetic responses by maximizing the expected reward through reinforcement learning. Given the powerful text generation capability of pre-trained language models, EmpRL utilizes the pre-trained T5 model as the generator and conducts further training to initialize the policy. To align the empathy level between generated responses and target responses in the context, an empathy reward function containing three empathy communication mechanisms, i.e., emotional reaction, interpretation, and exploration, is constructed using pre-designed and pre-trained empathy identifiers. Finally, the proximal policy optimization algorithm is used to further train the policy to produce empathetic responses. Both automatic and manual evaluations demonstrate that the proposed EmpRL framework can improve the quality of generated responses, enhance the empathy level similarity between generated and target responses, and produce empathetic responses covering both affective and cognitive aspects.


Machine Against the RAG: Jamming Retrieval-Augmented Generation with Blocker Documents

arXiv.org Artificial Intelligence

Retrieval-augmented generation (RAG) systems respond to queries by retrieving relevant documents from a knowledge database, then generating an answer by applying an LLM to the retrieved documents. We demonstrate that RAG systems that operate on databases with potentially untrusted content are vulnerable to a new class of denial-of-service attacks we call jamming. An adversary can add a single ``blocker'' document to the database that will be retrieved in response to a specific query and, furthermore, result in the RAG system not answering the query - ostensibly because it lacks the information or because the answer is unsafe. We describe and analyze several methods for generating blocker documents, including a new method based on black-box optimization that does not require the adversary to know the embedding or LLM used by the target RAG system, nor access to an auxiliary LLM to generate blocker documents. We measure the efficacy of the considered methods against several LLMs and embeddings, and demonstrate that the existing safety metrics for LLMs do not capture their vulnerability to jamming. We then discuss defenses against blocker documents.


Semantic-guided Prompt Organization for Universal Goal Hijacking against LLMs

arXiv.org Artificial Intelligence

Abstract--With the rising popularity of Large Language Models (LLMs), assessing their trustworthiness through security tasks has gained critical importance. Regarding the new task of universal goal hijacking, previous efforts have concentrated solely on optimization algorithms, overlooking the crucial role of the prompt. To fill this gap, we propose a universal goal hijacking method called POUGH that incorporates semantic-guided prompt processing strategies. Specifically, the method starts with a sampling strategy to select representative prompts from a candidate pool, followed by a ranking strategy that prioritizes the prompts. Once the prompts are organized sequentially, the method employs an iterative optimization algorithm to generate the universal fixed suffix for the prompts. Experiments conducted on four popular LLMs and ten types of target responses verified the effectiveness of our method.